SPECIFICATION
TITLE OF THE INVENTION 
DIGITAL STILL CAMERA AND METHOD OF CONTROLLING OPERATION 
OF SAME 

BACKGROUND OF THE INVENTION

Field of the Invention 

This invention relates to a digital still camera 
and to a method of controlling the operation thereof. 
Description of the Related Art 

Digital still cameras capable of recording voice
data on a recording medium are now well on their way to
being realized. Such digital still cameras are capable
of sensing the image of a subject, recording image data
representing the image of the subject on a memory card
and recording voice data, which represents voice
contained in the sensed image of the subject, on the
memory card. By reading the image data and voice data
that have been recorded on the memory card, voice
represented by the voice data can be output while
the image represented by the image data is being
displayed.

In order to output voice, however, the device that
reproduces the image must be equipped with a speaker or
the like for outputting voice. In the absence of a
speaker, voice cannot be output even if voice data has
been recorded on the memory card.

DISCLOSURE OF THE INVENTION 
Accordingly, an object of the present invention is
to make it possible to ascertain the content of voice
represented by voice data even if an image playback
device does not have a function for outputting voice.

According to the present invention, the foregoing 
object is attained by providing a digital still camera 
having an image sensing device for sensing the image of 
a subject and outputting image data representing the 
image of the subject, and an image recording controller 
for recording image data, which has been output from the 
image sensing device, on a recording medium, the camera 
comprising: a voice input unit for inputting voice and 
outputting voice data representing voice; a voice 
recording controller for recording voice data, which has 
been output from the voice input unit, on the recording 
medium; a character data generating unit for generating 
character data representing voice represented by voice 
data output from the voice input unit; and a character 
recording controller for recording character data, which 
has been generated by the character data generating 
unit, on the recording medium. 

The present invention provides also an operation 
control method suited to the camera described above. 
Specifically, the invention provides a method of 
controlling operation of a digital still camera having 
an image sensing device for sensing the image of a 
subject and outputting image data representing the image 
of the subject, and an image recording controller for 
recording image data, which has been output from the
image sensing device, on a recording medium, the method
comprising the steps of: inputting voice and obtaining
voice data representing voice; recording obtained voice
data on the recording medium; generating character data
representing voice represented by obtained voice data;
and recording generated character data on the recording
medium.

In accordance with the present invention, the image
of a subject is sensed and image data representing the
image of the subject is recorded on a recording medium.
Further, voice is input and data representing voice is
recorded on the recording medium. Furthermore,
character data (character codes) representing this voice
is generated. The generated character data also is
recorded on the recording medium.

When an image is reproduced, image data that has
been recorded on the recording medium is read from the
medium and an image represented by the read image data
is displayed. Character data also is read from the
recording medium and characters represented by the
character data can be displayed on the image. Thus, the
content of voice can be ascertained even with an image
playback device that does not possess a function such as
a voice playback function. As a result, the atmosphere
represented by voice at the time the image was captured
can be grasped even with an image playback device that
does not possess a function such as a voice playback
function.

Of course, in the case of an image playback device
having a voice playback function, voice data would be
read from the recording medium and voice represented by
the read voice data would be output.

The voice input unit may be one which inputs voice
during the sensing of the image of a subject by the
image sensing device. In this case, the camera would
further comprise a first control unit for controlling
the image recording controller, the voice recording
controller and the character recording controller in
such a manner that at least two types of the data among
the image data, voice data and character data will be
recorded on the recording medium in a form linked to
each other.

Thus, mutually linked data can be found
immediately.

The camera may further comprise: a first reading
unit for reading image data and character data that has
been recorded on the recording medium; a first combining
unit for combining the characters, which are represented
by the character data, with an image displayed by the
image data that has been read by the first reading unit;
and a first display unit for displaying the image with
which the characters have been combined by the first
combining unit.

Thus, characters represented by the voice data can
be displayed without providing an image playback device
separate from the digital still camera.

The camera may further comprise a second combining
unit for combining characters, which are represented by
character data that has been generated by the character
data generating unit, with an image output from the
image sensing device; and a second control unit for
controlling the image recording controller and the
character recording controller in such a manner that
image data representing an image with which characters
have been combined by the second combining unit will be
recorded on the recording medium.

The camera may further comprise: a determination
unit for determining whether the digital still camera
has a voice output unit when playback is performed; a
second control unit, responsive to a determination by
the determination unit to the effect that the camera has
a voice output unit, for outputting voice, which is
represented by the voice data, from the voice output
unit and halting display of characters represented by
the character data; and a third control unit, responsive
to a determination by the determination unit to the
effect that the camera does not have a voice output
unit, for controlling a display unit so as to display
the characters represented by the character data.

Since characters are not displayed when voice can
be output, the characters will not be superimposed on an
image.

The camera may further comprise a second reading
unit for reading character data that has been recorded
on the recording medium; a second display unit for
displaying characters represented by character data that
has been read by the second reading unit; and an erasure
control unit responsive to an erase command for erasing
voice data, which corresponds to characters being
displayed on the second display unit, from the recording
medium.

The content of voice corresponding to characters
can be ascertained by viewing the characters. Thus a
user can decide whether or not to erase voice data
without listening to the voice.

The image recording controller may record image
data, which has been output from the image sensing
device, in response to input of predetermined voice to
the voice input unit.

Thus, a command for recording image data can be 
applied by inputting predetermined voice. 

Thus, image data representing an image with which
characters have been combined can be recorded on the
recording medium. Even if the image playback unit is
not equipped with a circuit for combining an image and
characters, an image with which characters have been
combined can be displayed at the time of image playback.

The camera may further comprise a third reading
unit for reading image data, which represents an image
with which characters have been combined, from the
recording medium; and a second display unit for
displaying an image represented by image data that has
been read by the third reading unit.

Thus, an image with which characters have been
combined can be displayed without providing an image
playback device separate from the digital still camera.

Other features and advantages of the present
invention will be apparent from the following
description taken in conjunction with the accompanying
drawings, in which like reference characters designate
the same or similar parts throughout the figures
thereof.

BRIEF DESCRIPTION OF THE DRAWINGS 
Fig. 1 is a block diagram showing the electrical
construction of a digital still camera according to an
embodiment of the present invention;

Fig. 2 is a diagram showing the data structure of a
memory card according to this embodiment;

Fig. 3 is a flowchart illustrating processing
executed at the time of photography according to this
embodiment;

Fig. 4 is a flowchart illustrating processing
executed at the time of playback according to this
embodiment;

Fig. 5 shows an example of a reproduced image;

Fig. 6 is a block diagram showing the electrical
construction of a digital still camera according to
another embodiment of the present invention;

Fig. 7 is a diagram showing the data structure of a
memory card according to this embodiment;

Fig. 8 is a flowchart illustrating processing
executed at the time of photography according to this
embodiment; and

Figs. 9 and 10 are flowcharts illustrating
processing executed at the time of playback according to
this embodiment.

DESCRIPTION OF THE PREFERRED EMBODIMENTS 

Preferred embodiments of the present invention will
now be described in detail with reference to the
drawings.

Fig. 1 is a block diagram showing the electrical 
construction of a digital still camera according to an 
embodiment of the present invention. 

The overall operation of the digital still camera 
is controlled by a control circuit 20. 

The digital still camera includes a shutter-release
button 21 which, when pressed, applies a signal
indicative thereof to the control circuit 20.

The digital still camera further includes a mode 
setting switch 22. The latter makes it possible to set 
various modes, such as an imaging mode, voice recording 
mode, telop (television opaque projector) recording mode 
and playback mode. A signal representing the set mode 
is input to the control circuit 20. 

The digital still camera is further provided with a
voice erasure function, the details of which will be
described later. A voice erase command from a voice
erase switch 23 also is input to the control circuit 20.

In the imaging mode, the image of a subject is
sensed and the shutter-release button 21 is pressed,
whereby image data representing the image of the subject
is recorded on a memory card 30. The voice recording
mode is for recording voice data, which represents
voice, on the memory card 30 together with image data.
In the telop recording mode, data representing
characters represented by the voice data is recorded on
the memory card 30 together with the image data and
voice data. The playback mode is for reproducing an
image represented by the image data that has been
recorded on the recording medium.

Voice is input by a microphone 1 and a voice signal
representing voice is output. The voice signal is input
to a voice recognition circuit 2 and voice signal
processing circuit 5.

The voice recognition circuit 2 includes an
analog/digital converter for converting the input analog
voice signal to digital voice data. Characters
representing voice input to the microphone 1 are
recognized from the digital voice data obtained by the
conversion. Character codes (text code) representing
the recognized characters are generated in the voice
recognition circuit 2. The generated character codes
are applied to a buffer memory 3, where the codes are
stored temporarily.

The character codes are read out of the buffer
memory 3 and applied to an image conversion circuit 4.
The latter subjects the character codes to image-data
conversion processing for expressing, in the form of an
image, the characters represented by the character
codes. Data representing characters expressed in image
form shall be referred to as character data.

Character data output from the image conversion 
circuit 4 is applied to a recorded-data selection 
circuit 14 via a character data input circuit 11 
included in a recording controller 10. 

The voice signal that has been input to the voice
signal processing circuit 5 is subjected to
predetermined voice signal processing such as noise
removal processing. The voice signal processing circuit
5 also includes an analog/digital converter for
converting the analog voice signal to digital voice
data.

The digital voice data obtained by the conversion 
is applied to the recorded-data selection circuit 14 via 
a voice data input circuit 12. 

The image of a subject is formed on the
photoreceptor surface of a CCD 7 by an imaging lens 6.
A video signal representing the image of the subject is
output from the CCD 7 and input to a video signal
processing circuit 8. The latter subjects the video
signal to predetermined video signal processing such as
gamma correction processing, color-balance adjustment
processing and analog/digital signal conversion
processing.



Digital image data representing the image of the 
subject output from the video signal processing circuit 
8 is input to the recorded-data selection circuit 14 via 
an image data input circuit 13 included in the recording 
controller 10. 

The recorded-data selection circuit 14 selects and 
outputs the applied character data, voice data or image 
data. The data output from the recorded-data selection 
circuit 14 is applied to a file information setting 
circuit 15, where the data is subjected to processing 
that generates link data for linking the voice data and 
image data (e.g., as by using file names that are 
partially identical). The data output from the file 
information setting circuit 15 is then recorded on the 
memory card 30 under the control of a memory control 
circuit 16. 
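
The linking by partially identical file names mentioned above can be sketched as follows. This is only an illustrative sketch: the specification does not name any file format, so the stem pattern, the extensions and both function names are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical extensions; the specification does not define any file format.
EXTENSIONS = {"image": ".img", "voice": ".voc", "characters": ".chr"}


def linked_file_names(frame_number: int) -> dict:
    """Build file names for one frame that share an identical stem."""
    stem = f"FRAME{frame_number:04d}"
    return {kind: stem + ext for kind, ext in EXTENSIONS.items()}


def are_linked(name_a: str, name_b: str) -> bool:
    """Treat two files as linked when the parts before the extension match."""
    return PurePosixPath(name_a).stem == PurePosixPath(name_b).stem


names = linked_file_names(7)
print(names["image"], names["voice"], are_linked(names["image"], names["voice"]))
```

With such a scheme, the data linked to a given image could be located by comparing stems alone.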

Fig. 2 illustrates the data structure of the memory 
card 30. 

The memory card 30 includes a header recording area 
for recording management data, an image data recording 
area for recording image data, a character data 
recording area for recording character data and a voice 
data recording area for recording voice data. 

Image data obtained by imaging is recorded in the
image data recording area of the memory card 30 by the
memory control circuit 16. Further, character data,
which represents the content of voice by characters
obtained based upon voice recognition processing, is
recorded in the character data recording area.
Furthermore, voice data is recorded in the voice data
recording area.
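
As a rough illustration of the four recording areas of Fig. 2, the following sketch models the memory card as a simple in-memory structure. The class name, the field names and the use of list indices as link data are assumptions made for illustration; the specification does not prescribe a concrete layout.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MemoryCard:
    """Hypothetical in-memory model of the four recording areas."""
    header: dict = field(default_factory=dict)        # management (link) data
    image_area: list = field(default_factory=list)
    character_area: list = field(default_factory=list)
    voice_area: list = field(default_factory=list)

    @staticmethod
    def _store(area: list, data: bytes) -> int:
        """Append data to an area and return its index within that area."""
        area.append(data)
        return len(area) - 1

    def record(self, frame: int, image: bytes,
               voice: Optional[bytes] = None,
               characters: Optional[bytes] = None) -> None:
        """Record one frame and note in the header which items belong together."""
        links = {"image": self._store(self.image_area, image)}
        if voice is not None:
            links["voice"] = self._store(self.voice_area, voice)
        if characters is not None:
            links["characters"] = self._store(self.character_area, characters)
        self.header[frame] = links


card = MemoryCard()
card.record(1, b"image-1")
card.record(2, b"image-2", voice=b"voice-2", characters=b"telop-2")
print(card.header)
```

Keeping the link data in the header means that playback can tell, without scanning the data areas, whether voice data or character data exists for a selected image.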

With reference again to Fig. 1, the playback mode 
is such that image data that has been recorded on the 
memory card 30 is applied to an image data processing 
circuit 31, character data that has been recorded is 
applied to a character data processing circuit 32 and 
voice data that has been recorded is applied to a voice 
data processing circuit 33. 

The image data processing circuit 31 subjects the 
data that has been read from the memory card 30 to 
predetermined image processing such as format conversion 
processing that is suited to a display unit 35. The
character data processing circuit 32 subjects the 
character data to predetermined character processing 
such as format conversion processing suited to the 
display unit 35. Further, the voice data processing 
circuit 33 subjects the voice data to predetermined 
processing such as format conversion processing suited 
to output from a speaker 36. 

The image data output from the image data
processing circuit 31 and the character data output from
the character data processing circuit 32 are applied to
an image combining processing circuit 34. The latter
subjects the image data and character data to combining
processing in such a manner that characters represented
by the character data will be displayed on the image
represented by the image data. By applying the combined
image data to the display unit 35, the image with which
the characters have been combined will be displayed on
the display screen of the display unit 35.

Further, by applying the voice data output from the
voice data processing circuit 33 to the speaker 36,
voice represented by the voice data will be output.
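
The combining processing described above can be pictured with a minimal sketch. It assumes, purely for illustration, that the image is a grid of grayscale pixel values and that the character data is a bitmap in which non-zero cells are character pixels; the function name and parameters are hypothetical and do not describe the actual circuit 34.

```python
def combine_characters(image, glyph_bitmap, top, left, ink=255):
    """Return a copy of `image` with the character bitmap drawn at (top, left).

    `image` is a list of rows of pixel values; `glyph_bitmap` is a list of
    rows in which a non-zero cell marks a character pixel. Pixels that fall
    outside the image are ignored.
    """
    combined = [row[:] for row in image]          # leave the source image intact
    for r, glyph_row in enumerate(glyph_bitmap):
        for c, is_ink in enumerate(glyph_row):
            y, x = top + r, left + c
            if is_ink and 0 <= y < len(combined) and 0 <= x < len(combined[y]):
                combined[y][x] = ink
    return combined


image = [[0] * 4 for _ in range(3)]
telop = [[1, 0],
         [0, 1]]
for row in combine_characters(image, telop, top=1, left=2):
    print(row)
```

Because the source image is copied rather than modified, the same image data could still be displayed without characters.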

Fig. 3 is a flowchart illustrating processing
executed when the digital still camera performs
photography.

Whether or not the voice recording mode has been
set by the mode setting switch 22 is checked (step 41).
If the voice recording mode has not been set ("NO" at
step 41), it is considered that the camera has been set
merely to the imaging mode. If the shutter-release
button 21 is pressed, image data obtained as a result of
imaging a subject by the CCD 7 will be recorded in the
image data recording area of the memory card 30 (step
45). If the voice recording mode has not been set, then
voice data is not recorded on the memory card 30.

If the voice recording mode has been set ("YES" at
step 41), then whether the telop recording mode has been
set is checked (step 42). If the telop recording mode
has been set ("YES" at step 42), then, in response to
depression of the shutter-release button 21, the image
of the subject is sensed by the CCD 7 and image data
representing the image of the subject is obtained and,
moreover, input of voice by the microphone 1 starts.
Input of voice is performed for a fixed period of time
starting from depression of the shutter-release button
21.

Voice data representing voice is obtained, in the
manner set forth above, from the voice signal output by
the microphone 1. In the telop recording mode,
character data representing characters which indicate
the content of voice represented by the voice signal
also is generated.

Thus, in the telop recording mode, image data
representing the image of a subject, voice data
representing voice and character data for representing
the content of voice by characters are obtained. These
items of image data, voice data and character data are
selected successively by the recorded-data selection
circuit 14 and recorded on the memory card 30. The
obtained items of image data, voice data and character
data are recorded in the image data recording area,
voice data recording area and character data recording
area, respectively, of the memory card 30 (step 44). It
goes without saying that data indicating the
corresponding relationship among the corresponding items
of image data, voice data and character data is recorded
in the header recording area of the memory card 30, as
described above.

If the voice recording mode has been set but the
telop recording mode has not ("NO" at step 42), voice is
input by the microphone 1 but voice recognition
processing by the voice recognition circuit 2 is not
executed. Accordingly, character data representing
characters indicative of the content of voice is not
obtained. Image data representing the image of the
subject is recorded in the image data recording area of
memory card 30 and voice data is recorded in the voice
data recording area of memory card 30 (step 43).
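
The branching of Fig. 3 described above can be summarized in a short sketch. The function name and the returned tuple are hypothetical; only the step numbers and the recorded data items are taken from the description.

```python
def data_to_record(voice_mode: bool, telop_mode: bool):
    """Return (step number, items recorded) for the photography flow of Fig. 3."""
    if not voice_mode:                      # "NO" at step 41
        return 45, ("image",)
    if telop_mode:                          # "YES" at step 42
        return 44, ("image", "voice", "characters")
    return 43, ("image", "voice")           # "NO" at step 42


for modes in [(False, False), (True, False), (True, True)]:
    print(modes, data_to_record(*modes))
```

Note that the telop recording mode is examined only when the voice recording mode has been set, so character data is never recorded without the corresponding voice data in this embodiment.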

In the embodiment set forth above, image data or
the like is recorded on the memory card 30 in response
to depression of the shutter-release button 21.
However, an arrangement may be adopted in which image
data or the like is recorded on the memory card 30 in
response to input of predetermined voice to the
microphone 1. In this case, voice data representing
voice that triggers recording of image data would be
stored in a prescribed memory beforehand and image data
would be recorded on the memory card 30 in response to a
match between voice data representing entered voice and
the voice data that has been stored.

Fig. 4 is a flowchart illustrating processing
executed by the digital still camera at the time of
playback.

Image data is read out of the memory card 30 loaded
in the digital still camera, and is given to the display
unit 35 via the image data processing circuit 31 and
image combining processing circuit 34. The image
represented by the image data that has been read out is
displayed on the display screen of the display unit 35.
While observing the image displayed on the display
screen of the display unit 35, the user selects an image
to be reproduced [it goes without saying that the
digital still camera is provided with a frame selection
switch (not shown) or the like for selecting a playback
image] (step 51).

On the basis of the link data that has been
recorded in the header recording area of the memory card
30, it is determined whether voice data corresponding to
the image data representing the selected image has been
recorded in the voice data recording area of the memory
card 30 (step 52).

If voice data corresponding to the selected image
data has not been recorded on the memory card 30 ("NO"
at step 52), then it is construed that the selected
image data was captured simply in the imaging mode. The
image represented by the selected image data is
displayed on the display screen of the display unit 35
(step 56) without output of voice.

If voice data corresponding to the selected image
data has been recorded on the memory card 30 ("YES" at
step 52), then it is determined whether character data
corresponding to the image data has been recorded on the
memory card 30 (step 53).

If both voice data and character data corresponding
to the image data have been recorded on the memory card
30 ("YES" at both steps 52 and 53), then the voice data
and character data corresponding to the selected image
data are read out of the memory card 30. The items of
image data, character data and voice data that have been
read out are applied to the image data processing
circuit 31, character data processing circuit 32 and
voice data processing circuit 33, respectively. As
described above, various processing is executed and the
items of image data and character data are combined in
the image combining processing circuit 34. The image
data with which the character data has been combined is
applied to the display unit 35. As a result, an image
combined with telop characters 37, which are represented
by the character data, is displayed on the display
screen of the display unit 35, as shown in Fig. 5.
Further, voice data is applied to the speaker 36 in
conformity with the display of the image so that voice
conforming to the telop characters 37 is output (step
55).

If there is no character data corresponding to
image data ("NO" at step 53), image data that has been
read out of the memory card 30 is applied to the display
unit 35 so that the image is displayed. Since there is
no character data corresponding to the read image data,
telop characters are not displayed. Since there is
voice data corresponding to the read image data, voice
represented by this voice data is output from the
speaker 36 (step 54).

Furthermore, it is determined whether a voice erase
command has been applied by the voice erase switch 23
(step 57). In a case where telop characters are being
displayed when a voice erase command is applied ("YES"
at step 57), voice data representing voice corresponding
to these telop characters is erased from the memory card
30 (step 58). The content of voice can be checked by
observing the telop characters. Unnecessary voice can
be erased from the memory card 30 without listening to
it.
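
The playback decisions of Fig. 4 (steps 52 to 58) can be sketched as follows. The function names and the dictionary keys are hypothetical; the step numbers and outcomes follow the description above.

```python
def playback_action(has_voice: bool, has_characters: bool) -> dict:
    """Decide what is presented for the selected image (steps 52 to 56)."""
    if not has_voice:                       # "NO" at step 52
        return {"step": 56, "telop": False, "voice": False}
    if has_characters:                      # "YES" at step 53
        return {"step": 55, "telop": True, "voice": True}
    return {"step": 54, "telop": False, "voice": True}   # "NO" at step 53


def should_erase_voice(erase_command: bool, telop_displayed: bool) -> bool:
    """Steps 57 and 58: erase voice data only while its telop is displayed."""
    return erase_command and telop_displayed


action = playback_action(has_voice=True, has_characters=True)
print(action, should_erase_voice(True, action["telop"]))
```

In this sketch the erase decision depends on the telop being displayed, reflecting the point that the user judges the voice content from the characters rather than by listening.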

In the embodiment described above, the digital
still camera is provided with the speaker 36 and
therefore voice represented by voice data is output. It
goes without saying, however, that voice will not be
output if the speaker 36 has not been provided. Since
telop characters indicating the content of voice are
displayed on the image even if the playback device is
not provided with a speaker, it is still possible to
ascertain the content of voice.

Further, in the embodiment described above,
character data representing characters in the form of an
image has been recorded on the memory card 30. However,
character codes may be recorded on the memory card 30.

Figs. 6 to 9 illustrate another embodiment of the
present invention. According to the above-described
embodiment, character data is combined with image data
when an image is reproduced. With the embodiment shown
in Figs. 6 to 9, however, image data is combined with
character data at the time of recording and the image
data with which the character data has been combined is
recorded on the memory card 30.

Fig. 6 is a block diagram showing the electrical
construction of the digital still camera according to
this embodiment. Components identical with those shown
in Fig. 1 are designated by like reference characters
and need not be described again. Fig. 7 illustrates the
data structure of the memory card 30.

Fig. 8 is a flowchart illustrating processing
executed when photography is performed using the digital
still camera shown in Fig. 6, and Fig. 9 is a
flowchart illustrating processing executed when playback
is performed using the digital still camera shown in
Fig. 6. Processing steps identical with those shown in
Figs. 3 and 4 are designated by like step numbers and
need not be described again.

Items of image data, voice data and character data
are obtained in the telop recording mode ("YES" at step
42 in Fig. 8) in a manner similar to that of the
above-described embodiment. These items of image data,
voice data and character data are applied to a data
combining circuit 24. The latter combines the character
data with the image data, whereby there is obtained image
data representing an image with which telop characters
have been combined (step 46 in Fig. 8).

The memory card 30 has the header recording area,
image data recording area and voice data recording area.

Image data with which the character data has been
combined is recorded in the image data recording area of
memory card 30. Further, voice data is recorded in the
voice data recording area (step 47 in Fig. 8). Thus,
character data alone is not recorded on the memory card 30.

Image data with which character data has thus been
combined is read out of the memory card 30 and applied
to the display unit 35 via the character data processing
circuit 32. At playback, the image with which telop
characters have been combined can be displayed on the
display screen of the display unit 35 (steps 55, 56A in
Fig. 9) without executing processing for combining the
character data with the image data. Further, it goes
without saying that if voice data is present, then voice
is output by applying the voice data to the speaker 36
(step 55 in Fig. 9).

This embodiment is useful when a device exclusively
for playback does not have an image combining function.
That is, when the playback device does not have an image
combining function, telop characters cannot be combined
with and displayed on an image. In this embodiment,
however, image data with which telop characters have
already been combined is produced in advance and
recorded on the memory card 30. At playback, image
combining processing is unnecessary. This means that
image data with which telop characters have been
combined can be displayed even with a playback device
not having an image combining function.

Though each of the above-described circuits is
implemented by hardware, some or all of these circuits
may be implemented by software.

Fig. 10 is a flowchart illustrating processing 
executed when playback is performed by this digital 
still camera. 

In a manner similar to that described above, a
playback image is selected (step 61). It is then
determined whether the digital still camera has a
speaker (step 62).

If the camera has a speaker ("YES" at step 62),
voice is output from the speaker and an image without
telop characters is displayed (step 63). If the camera
does not have a speaker ("NO" at step 62), voice output
is halted and an image with telop characters is
displayed (step 64). When the camera has a speaker,
telop characters are not displayed. This means that
telop characters will not interfere with viewing of the
image.
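
The speaker determination of Fig. 10 reduces to a single branch, sketched below. The function name and the returned keys are hypothetical; how the camera actually detects the presence of a speaker is not specified and is simply passed in here as a flag.

```python
def playback_output(has_speaker: bool) -> dict:
    """Choose between voice output and telop display (steps 62 to 64)."""
    if has_speaker:                          # "YES" at step 62
        return {"step": 63, "voice": True, "telop": False}
    return {"step": 64, "voice": False, "telop": True}   # "NO" at step 62


print(playback_output(True))
print(playback_output(False))
```

Exactly one of the two outputs is active in either branch, which is what keeps telop characters from covering the image when voice can be heard.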

As many apparently widely different embodiments of
the present invention can be made without departing from
the spirit and scope thereof, it is to be understood
that the invention is not limited to the specific
embodiments thereof except as defined in the appended
claims.



